Your software may offer one or more of the following goodness-of-fit measures:
A measure of agreement between the observed and predicted outcomes called concordance (see
the bottom of Figure 23-4). Concordance indicates the extent to which participants with higher
predicted hazard values had shorter observed survival times, which is what you’d expect. Figure
23-4 shows a concordance of 0.642 for this regression.
An r (or r²) value that’s interpreted like a correlation coefficient in ordinary regression, meaning
the larger the r² value, the better the model fits the data. In Figure 23-4, r² (labeled Rsquare) is
0.116.
A likelihood ratio test and associated p value that compares the full model, which includes all the
parameters, to a model consisting of just the overall baseline function. In Figure 23-4, the
likelihood ratio p value is shown as
, which is scientific notation for
,
indicating a model that includes the CenterCD and Radiation variables can predict survival
statistically significantly better than just the overall (baseline) survival curve.
Akaike’s Information Criterion (AIC) is especially useful for comparing alternative models but is
not included in Figure 23-4.
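To make the idea of concordance concrete, here is a minimal Python sketch of one common definition (often called Harrell's C). It is an illustration, not the calculation your software necessarily performs: the function name `concordance` and the tiny data set are hypothetical, and the sketch simply skips pairs with tied survival times, which real packages handle in more refined ways.

```python
from itertools import combinations

def concordance(times, events, risk_scores):
    """Fraction of comparable pairs in which the participant with the
    higher predicted hazard had the shorter observed survival time.

    times       -- observed follow-up times
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- predicted hazard (or linear predictor) per participant
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplifying assumption: ignore tied times
        # Let a be the participant with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk, shorter survival: agreement
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    return concordant / comparable if comparable else float("nan")

# Hypothetical data: four participants, the third one censored.
c = concordance([2, 4, 6, 8], [1, 1, 0, 1], [3.0, 2.0, 2.5, 1.0])
```

A value of 0.5 means the model orders participants no better than chance, and 1.0 means every comparable pair is ordered correctly.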
Focusing on baseline survival and hazard functions
The baseline survival function is represented as a table with two columns — time and predicted
survival — and a row for each distinct time at which one or more events were observed.
The baseline survival function’s table may have hundreds of rows for large data sets, so
instead of printing it, you should save the table as a data file. Then, you can use it to generate a
customized prognosis curve (described in the next section) for any specific set of values for the
predictor variables.
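If your software hands you the baseline table but has no convenient export, a few lines of Python can store it as a data file. This is only a sketch: the function names `save_baseline` and `load_baseline`, the column headings, and the file layout (plain CSV) are assumptions made for illustration.

```python
import csv

def save_baseline(path, rows):
    """Write (time, survival) pairs to a CSV file with a header row."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["time", "survival"])
        writer.writerows(rows)

def load_baseline(path):
    """Read the table back as a list of (time, survival) float pairs."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh)
        return [(float(r["time"]), float(r["survival"])) for r in reader]
```

Each row corresponds to one distinct event time, so the file stays easy to inspect or to load into another program later.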
The software may also offer a graph of the baseline survival function. If your software is using an
average-participant baseline (see the earlier section, “The steps to perform a PH regression”), this
graph is useful as an indicator of the entire group’s overall survival. But if your software uses a zero-
participant baseline, the curve is not helpful.
How Long Have I Got, Doc? Constructing Prognosis Curves
A primary reason to use regression analysis is to predict outcomes from any particular set of predictor
values. For survival analysis, you can use the regression coefficients from a PH regression along with
the baseline survival curve to construct an expected survival (prognosis) curve for any set of predictor
values.
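The arithmetic behind a prognosis curve follows from the proportional-hazards assumption: each baseline survival value is raised to the power of the hazard ratio, which is e raised to the sum of each coefficient times the difference between the predictor's value and its baseline reference value. The Python sketch below illustrates that calculation. The function name `prognosis_curve` and the numbers in the usage lines are hypothetical, and the `reference` argument is an assumption about how you'd represent the baseline: pass the predictor means for an average-participant baseline, or omit it for a zero-participant baseline.

```python
import math

def prognosis_curve(baseline, coefs, values, reference=None):
    """Expected survival curve for one set of predictor values.

    baseline  -- list of (time, baseline_survival) pairs
    coefs     -- dict of regression coefficients, keyed by predictor name
    values    -- dict of predictor values for the participant of interest
    reference -- dict of predictor values the baseline refers to
                 (defaults to all zeros)
    """
    if reference is None:
        reference = {name: 0.0 for name in coefs}
    # Linear predictor relative to the baseline participant.
    linear_predictor = sum(
        coefs[name] * (values[name] - reference[name]) for name in coefs
    )
    hazard_ratio = math.exp(linear_predictor)
    # Under proportional hazards, S(t) = S0(t) ** hazard_ratio.
    return [(t, s0 ** hazard_ratio) for t, s0 in baseline]

# Hypothetical example: one predictor whose coefficient doubles the hazard.
curve = prognosis_curve(
    baseline=[(1, 0.9), (2, 0.8)],
    coefs={"stage": math.log(2)},
    values={"stage": 1},
)
```

When the predictor values equal the reference values, the hazard ratio is 1 and the prognosis curve is just the baseline curve.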
Suppose that you’re modeling survival time (from diagnosis to death) for a group of cancer patients in which the
predictors are age, tumor stage, and tumor grade at the time of diagnosis. You’d run a PH regression on
your data and have the program generate the baseline survival curve as a table of times and survival